A generalised deep learning-based surrogate model for homogenisation utilising material property encoding and physics-based bounds

The use of surrogate models based on Convolutional Neural Networks (CNN) is increasing significantly in microstructure analysis and property predictions. One of the shortcomings of the existing models is their limitation in feeding the material information. In this context, a simple method is developed for encoding material properties into the microstructure image so that the model learns material information in addition to the structure-property relationship. These ideas are demonstrated by developing a CNN model that can be used for fibre-reinforced composite materials with a ratio of elastic moduli of the fibre to the matrix between 5 and 250 and fibre volume fractions between 25 and 75%, which span end-to-end practical range. The learning convergence curves, with mean absolute percentage error as the metric of interest, are used to find the optimal number of training samples and demonstrate the model performance. The generality of the trained model is showcased through its predictions on completely unseen microstructures whose samples are drawn from the extrapolated domain of the fibre volume fractions and elastic moduli contrasts. Also, in order to make the predictions physically admissible, models are trained by enforcing Hashin–Shtrikman bounds which led to enhanced model performance in the extrapolated domain.

www.nature.com/scientificreports/ geometry and distribution. These shortcomings can be addressed with finite element analysis (FEA) based homogenisation 10,11 , where multiple boundary value problems are solved on a representative volume element (RVE) using different load cases. Some variations of this conventional FEA approach 12,13 are developed to reduce the computational costs. Variational Asymptotic Method (VAM)-based homogenisation, for example, gives an effective material matrix using single finite element analysis without any post-processing in contrast to solving multiple cases along with equally demanding post-processing steps in the conventional approach. Still, the computational time and resources required are significant enough to slow down the search for better composite materials. Hence, active research is being pursued to combine computational micro-mechanics and data-driven artificial intelligence (AI) methods to build surrogate models 6,7,[14][15][16][17] . CNN models have been used widely in the micro-mechanics 15,16,[18][19][20][21] as micro-structure information is available, generally, either in image form (for 2D) or voxelated form (for 3D). The success of CNN architecture over simple artificial neural networks can be attributed to its self-feature learning capability and utilising local connectivity characteristics using the following two basic assumptions 22 . One, low-level features are assumed to be local and do not depend on the spatially far-off features, which is implemented by connecting downstream neurons with only spatially neighbouring neurons in the upstream through a kernel (or filter) of the convolution operation. The second assumption is that a feature learnt at one spatial location is useful at the other location. Hence the kernel with the same weights is used at all locations of the image. Generally, CNN models are constructed in two stages. First, data features are learnt through a series of convolution and pooling operations on the input samples. The second stage contains a conventional multi-layer perceptron which takes the output of the first stage as a flattened array. Dense connections of the last stage increase the number of learnable parameters drastically, thus leading to heavier computation costs and longer training times. Hence, Mann and Kaidindi 20 have developed a CNN model wherein the output of the first stage is directly mapped to outputs. Also, at the end of the first stage, using globally averaged pooling instead of simple flattening was proved to reduce the number of parameters and over-fitting in the model 18,23 . Innovative architectures of the first stage have led to efficient CNN models like AlexNet, VGG, and ResNet. Among these, the VGG model has been adopted widely in many micro-mechanical models 18,19,24 , either directly by transfer learning or using its principle of stacking convolutional layers with delayed pooling operations. For example, Li et al. 19 used pruned VGG-16 model for learning and reconstructing micro-structure features wherein high-level layers or those away from the input layer are removed to reduce the computational cost. We have used the working principle of this simple and standard architecture because the primary focus of the present work is to develop data sets that are aware of the material information and to evaluate its influence on the model performance. Though CNN models are free of feature engineering, some models have demonstrated that by supplying modified input instead of simple raw images, model learning capability can be enhanced 17,20,25 . For example, Mann and Kalidindi 20 used two-point spatial correlations of the micro-structure; Cheng and Wagner 17 have developed RVE-net which uses loading conditions and parameterised geometry (by level set fields) as the input. As the preparation of labels is computationally intensive, some CNN models have developed using physics information to learn labels implicitly 17,26 . Li and Chen 26 have modelled the constitutive behaviour of the hyper-elastic materials by embedding equilibrium conditions in the CNN model.
In the case of composite materials, it is desirable to have a surrogate model that can be used across more comprehensive ranges of fibre volume fractions ( V f ) and constituent properties. The existing models are built for either a particular fibre volume fraction or a small range of fibre volume fractions (less than 50%) and a particular fibre-matrix materials combination. In this work, we develop a model that can be used for broader ranges of fibre volume fractions V f ∈ [25%, 75%] and fibre-matrix elastic modulus contrast (ratio) E cr ∈ [5,250] and also the predictive capabilities of the trained models is assessed in the extrapolated domain of V f ∈ [10%, 75%] and E cr ∈ [5,500] . Grayscale images of the microstructure provide the geometrical features like V f but not the material information. So, if the model has to work with different material systems, it should learn to detect constituent material properties. For this purpose, a simple and novel method is developed wherein material information is supplied as higher-order tensors which are prepared by encoding material properties of each phase into a grayscale image of the microstructure. Another alternative way of ingesting the constituent properties is through multi-modal or mixed inputs. In this approach, numerical values of the constituent properties can be concatenated to the flattened array after the convolution operation, avoiding the encoding operation 27 . However, this approach might require more samples to learn the spatial location of the material properties, whereas the samples prepared from direct encoding are informed about the constituent's spatial location. Also, the physical admissibility of the model predictions is assessed using physics-based bounds. Despite the acceptable levels of performance metrics, a significant number of outliers to bounds are observed in certain regions of the domain. These outliers are completely eliminated by training the models with hard enforcement of bounds. For this purpose, we have used the Hashin-Shtrikman bounds 28,29 in model training.
The paper is structured as follows: initially, datasets generation is explained with the details of microstructure generation, material property encoding and label preparation. Then, CNN models are built, and their performance is studied on the unseen samples of the training data sets domain and their extrapolated domain using absolute percentage error plots. In the end, the physics-based bounds are used to quantify and eliminate the physically inadmissible model predictions.

Data sets generation
The dataset is constituted of a stack of RVE samples wherein each sample contains the binary image of the RVE as input and its normalised transverse elastic properties as target labels. Here, RVE is a representative volume element of the unidirectional composite with randomly distributed fibres of circular cross-sections. Let X bw ∈ R n s ×n w ×n h ×1 be the input part of the dataset containing n s number of RVE images with n w and n h pixels along width and height, respectively. Along with X bw , one needs to supply material properties of the www.nature.com/scientificreports/ constituents, which will be encoded into the RVE image at their respective spatial locations, as explained in the material information array preparation section. At the end of this pre-processing step, we get a higher-order tensor X ∈ R n s ×n w ×n h ×n m containing n m layers for each image representing different properties of interest. The input ( X bw ), material information arrays ( X ) and labels ( Y ) of the dataset are shown schematically in Fig. 1.
In order to develop a generic surrogate model that is applicable to broader practical applications, data sets are created with a wide range of fibre volume fractions (V f ∈ [25%, 75%]) and constituent material property contrasts ( E cr = E f /E m ∈ [5,250] ). For a given V f , from Adam and Doner 30 observations, transverse elastic properties of unidirectional composites rapidly increase at lower fibre-matrix elastic modulus contrast E cr = E f /E m and then stabilises; this phenomenon becomes more pronounced at higher V f . Transverse elastic modulus is found to stabilise at about E cr = 250 for V f = 75% 30 so the maximum E cr selected as 250 in this study. For each RVE, the fibre volume fraction ( V f ) and material properties ( E f and E m ) are drawn randomly with uniform probability from their respective range. If the randomly chosen E f and E m are such that E cr is outside the range, a new pair is drawn until the E cr is within its selected range. The scatter plot of V f and E cr for 30,000 RVEs is shown in Fig. 2a. One can notice that the samples are spread uniformly with respect to fibre volume fraction but non-uniformly with respect to E cr . This is due to wider range of E f ∈ [10 GPa, 500 GPa] compared to E m ∈ [1 GPa, 10 GPa] with a constraint on the E cr range. For a given V f , from Adam and Doner 30 and Fig. 2b, transverse elastic property varies rapidly at lower E cr and stabilises at higher E cr . Hence, we assume that having lesser samples in the region of negligible property variation has minor effect on the model performance.
The data set, D 1 , developed in this work contains 30,000 samples with input X bw ∈ N 30,000×256×256×1 and labels Y ∈ R 30,000×3 , which will be split in a 2:1 ratio for training and testing performance of the models, respectively. Here, the size of the RVE binary image (i.e., representing matrix with 0 and fibre with 1) is chosen as 256 × 256 following a convergence study, as explained in the next section.
Note that the dataset is designed as a union of 120 chunks wherein each chunk containing 250 samples follows the identical distribution (of V f and E cr ) as the whole dataset. This is to ensure the identical distribution for the smaller data sets which will be used in convergence studies for finding the optimum image size and optimum training set size. The following points list the steps involved in data sets preparation while the detailed procedure is given in the later part of this section, For each RVE, 1. Draw V f and E cr from the selected range; 2. Generate RVE for the respective fibre volume fraction, V f ; 3. Save RVE as a black and white binary image, representing matrix with 0 and fibre with 1; 4. Material information arrays are prepared using Eq. (4), from binary image during the prediction; 5. The transverse elastic properties are determined using physics-based simulations and normalised with their respective matrix modulus.

Microstructural RVE generation.
Periodic RVEs of unidirectional composite materials, with the random distribution of circular fibres, are generated using an optimisation-based algorithm recently developed by the authors 31 . Here, the periodicity of RVE implies a fibre leaving an edge(s) must enter from the opposite edge(s) such that RVE is continuous when it is repeated in the space as shown in Fig. 3a. Such periodicity is necessary for applying periodic boundary conditions during the homogenisation of the RVE to evaluate effective properties. RVEs generated using this algorithm have proved randomness in the fibre distribution and transverse isotropy as an actual microstructure using statistical and micro-mechanical analysis 31 . Initially, fibre cross-section centres  www.nature.com/scientificreports/ x = (x, y) are placed randomly in the RVE domain while allowing fibre overlaps. Then, a constrained optimisation problem is solved to minimise the magnitude of fibre overlap f as shown in Eq. (1).
The total magnitude of overlap f and its gradient can be explicitly evaluated 31 as shown in Eq. (2) where C ij is the magnitude of i-th fibre intrusion into the j-th fibre, H is Heavside step function, d ij is the actual distance of between the centres of fibre i and j, d ij is the distance between the centres of fibres when they are externally touching with each other, and N is the total number of fibres in the RVE. We have used Julia language 32 to solve the optimization problem Eq. (1). On a computer with an Intel Xeon CPU 2.40 GHz processor and 64GB RAM, generating 30,000 RVEs with uniform distribution of V f ∈ [25%, 75%] took 106.8 min. The computational time might vary slightly due to the stochastic nature of V f and the optimisation convergence for each RVE. Four sample RVE images, generated using this approach, are shown with 256 × 256 resolution in Fig. 3.

Material information array preparation.
In this section, the procedure for creating material information arrays from an RVE image is developed. Let the array I (g) ∈ R n w ×n h represent a grayscale image of RVE with N ph material phases where a unique pixel value, In order to avoid confusion with the continuous phase matrix of the micro-structure, the term array is used to imply a mathematical matrix or, more specifically, rectangular arrangement of image pixel values. We proceed to construct I ( ) , of same size as I (g) but with different pixel values representing material constant or property ∈ [ min , max ] . The pixel values of I ( ) can be evaluated using the Eq. (3). Here, the criteria for choosing the bounds, min and max , need not be based on the admissibility of the property but rather the range of values used for building the data sets. For example, from Table 1, elastic moduli limits can be chosen as E min = 1 GPa and E max = 500 GPa instead of E > 0 for creating all the data sets.
where δ(x) is the Dirac delta function with value 1 for x = 0 and 0 otherwise. Though Eq. (3) looks involved, it simply normalises the property i of i-th phase with respect to its bounds to [0, 1].
In the special case of a two-phase material, the Eq. (3) can be simplified to the Eq. (4). Let the phase 1 and the phase 2 of I (g) are represented with pixel values 0 and 1, respectively. Then the whole array, I ( ) , representing the information 1 for phase 1 and 2 for phase 2 , can be obtained using the following Eq. (4).
where J ∈ R n w ×n h is an array of all ones. A schematic of the elastic modulus information array, evaluated using Eq. (4), is shown in Fig. 4. It's worth emphasising that caution must be exercised while saving the material information arrays in image format. Pixel values are generally stored as a byte (8-bit), taking the integer values in [0, 255]. This might lead to 256 discrete divisions on the selected range of the material property instead of continuous values, as float values are rounded off to integers. To avoid this trouble, we chose to evaluate material information arrays during the model prediction in the pre-processing stage of the model, as shown in Fig. 5, at the cost of a slight increase in the computational cost.
In the present work, Poisson's ratio of fibre and matrix is chosen as the same, ν f = ν m = 0.25, to reduce the complexity of the analysis. However, this assumption is justified due to the weak dependency of Poisson's ratio mismatch on the transverse elastic properties 33,34 . Hence, Poisson's ratio information arrays are not included in the input so each sample contains only the elastic modulus information array.
Target properties evaluation. Target values of the data sets contain the transverse elastic properties E 22 , E 33 and G 23 , normalised with the respective matrix modulus. As the number of RVEs (30,000) is relatively larger, a computationally efficient homogenisation technique based on the variational asymptotic method (VAM) 13 is used in this work. In this approach, the whole effective elastic matrix D can be evaluated using a single simulation using the Eq. (5a) 13,35,36 (1) www.nature.com/scientificreports/ where is the volume of the RVE domain; D is the material stiffness matrix of the respective phase with size p × p ; B is a strain-displacement matrix, and n a is the number of total active degrees of freedom (i.e., excluding the dependent degrees of freedom due to periodic boundary conditions). A homogenisation tool, written in Julia 32 language, is developed to evaluate the effective material matrix D shown in Eq. (5). Note that VAMbased homogenisation also uses FEA for evaluating the terms in Eq. (5b), thus making it capable of capturing the RVE morphology and ensuring the high-fidelity of the solutions. In contrast to the conventional FEA-based implementations 10,11 , where one needs to solve as many boundary value problems (BVP) and post-processing steps as the number of material matrix columns, VAM-based homogenisation gives D with a single BVP solution.
For example, on a computer with an Intel Xeon CPU 2.40 GHz processor and 64 GB RAM, two-dimensional homogenisation of 20 RVEs using plane strain analysis took about 8.3 min with VAM and about 32.5 min with the conventional FEA approach with the same mesh and loading. This gain in terms of computational time becomes more significant in the case of three-dimensional homogenisation. Generated RVEs are modelled with a perfect interface between the fibre and the matrix. Then, the periodic mesh needed for applying periodic boundary conditions (PBC) is generated, with plane-strain elements, using an open source software, gmsh 37 . Then Eq. (5) is employed to find the transverse effective material matrix D . Mesh convergence study, performed at four combinations of the extremes of V f ∈ [25%, 75%] and E cr ∈ [5,250] ranges, has shown that the convergence of transverse elastic properties at about 50-60 thousand elements. Mesh contains a large proportion of quadrilateral elements and triangular elements in smaller proportion ( < 2% ). Next, the optimum RVE size (the ratio of RVE side length to the fibre radius) is determined as 30 following another transverse elastic property convergence study by varying the RVE size.    18,19,24 . The advantage of using a smaller kernel size with increased depth (or more layers) over a big one is to reduce the number of training parameters and probably enhance learning capability as the non-linear activation function is applied more times through the depth. Also, delayed pooling operation minimises information loss. Hence, in the present work, we have adopted the VGG kind of CNN architecture for building the model as shown in Fig. 5. In all the convolution layers, kernel size and stride are fixed to (3,3) and (1, 1), while the number of filters is shown in Fig. 5 for each convolution operation. Average pooling is chosen with a size of (2, 2) and a stride of (2, 2), following a comparative study with max pooling operation. Activation functions are essential elements in the deep learning model to infuse non-linearity. So, rectified linear unit (relu) activation is applied after every convolution layer. As the model is built to predict continuous real values, linear activation (or no activation) is used on the output layer. Note that as the data sets are too large to fit into the memory,  ij are true and predicted properties of a sample. Then, the Adam optimisation algorithm 39 is used, with a learning rate of 0.001, for updating the model weights such that the MSE is minimised. These steps are implemented using PyTorch 40 , an open-source deep-learning library with the Python programming interface, for building and training the CNN model. Training a model with the aforementioned hyper-parameters and ten thousand samples took about 80 min on a machine with 32 GB RAM, 3.7 GHz processor and 8 GB NVIDIA GPU RTX-3050.
RVE image size selection. The computational cost of the model training and inference is directly related to the image size. While a lower image size leads to cheaper computational demand, greedy downsampling of the image might severely alter the micro-structure details. Hence, in this section, we determine the appropriate RVE image size (hence that of material information arrays) by evaluating its influence on the model performance.
As the resolution of the image becomes lower, microstructural information may be lost due to pixelation. For example, the RVE of a sample with 54.7% fibre volume fraction is shown in Fig. 6a,b, respectively, with 128 × 128 and 512 × 512 resolution.
It can be noticed that, with 128 × 128 , the matrix between two fibre surfaces is replaced with fibre material, and the smooth profile of the fibre cross-section has become coarse. In this study, we consider five different resolutions ( 32 × 32 , 64 × 64 , 128 × 128 , 256 × 256 and 512 × 512 ) to understand the information loss and its influence on the model training. First, the absolute percentage deviation (APD) of fibre volume fraction due to pixelation of the image is quantified using the Eq. (7) and plotted in Fig. 6c. Here, V (image) f is evaluated as a fraction of white pixels (representing fibres) in the RVE image. It shows that, for example, saving an RVE at 64 × 64 resolution would lead to about 2-4% deviation in V is close to 75%. This deviation is found to reduce by increasing image resolution with less than 1% deviation for resolutions above 256 × 256 . But, selecting higher resolution causes exponentially increasing computational loads thus higher model training times.
Next, models are trained with all five considered resolutions at three different data set sizes (500, 1500, 2500). Further, at each combination of data set size and resolution, ten realisations of models are developed (with the same training samples and hyper-parameters) to account for the statistical nature of the training process. Then, the performance of these models is evaluated on the test samples and quantified with mean absolute percentage error (MAPE); In Fig. 6d, the mean of the MAPE evaluated over ten realisations is plotted against image resolutions with the standard deviation of MAPE as errorbars. It can be noticed that with increasing resolution and training set size, the MAPE and uncertainty have reduced.
From the above analysis, we have selected 256 × 256 image resolution for model training as the reduction in V f deviation (see Fig. 6c) and MAPE (see Fig. 6d) is not significant with an increase in image size from 256 to 512, compared to the increased computational cost.
Performance evaluation. In order to find the optimum number of samples required for effective learning, different models are trained with the number of samples n s ∈ {500 , 1000, 1500, 2000, 4000, 6000, 8000, 10,000, 15,000, 20,000} . As explained in the previous section, these subsets of the data set are ensured to have the same kind of distribution as that of the whole data set. Further, to understand the statistical nature of the training process, 10 different realisations of the same model are trained at each of the n s using the same set of samples and hyper-parameters. So, in total, 100 models are trained with ten subsets of the data set and 10 realisations at each of the subsets. Then, these trained models are tested on samples not seen during the training wherein the test set size is selected as half the size of the training set. In other words, for example, models trained on 5000 samples are tested using 2500 unseen samples. Mean absolute percentage error (MAPE), as defined in Eq. (8), is used to measure the predictive capability of the trained model.
where n test is the number of test samples, and the superscripts t and p indicate true and predicted values of y. Though MAPE is simpler to interpret and scale independent, it has certain limitations like tending to infinity or undefined when the true value approaches or equals to zero. However, in the present work, normalisation of the effective properties with the respective matrix modulus eliminates such trouble as true or target values y (t) i are always greater than or equal to one. Also, it is important to note that the absolute percentage error treats underestimation and overestimation differently.
The variation of the mean and standard deviation of MAPE, evaluated on the test set over 10 realisations, is plotted against the number of training examples in Fig. 7. We refer to these curves as learning convergence curves (LCC). In Fig. 7, one can observe that MAPE of all three normalised transverse properties ( E 22 , E 33 , G 23 ) has converged at about a training set of 10,000 samples. Also, as indicated by the error bars, the standard deviation www.nature.com/scientificreports/ has reduced significantly with the training set size. From this convergence analysis, we have selected training set size of 10000 as optimum and proceed to rigorously analyse the models trained with this data set size. The transverse elastic properties (i.e., target properties) depend on the fibre volume fraction V f and elastic modulus contrast E cr , as shown in Fig. 2. It is difficult to infer model performance with respect to these parameters using MAPE, as it squashes information at all V f or all E cr into a single value, see Eq. (8). So, in order to get a clear understanding of the model's predictive capability, the absolute percentage error (APE) of each prediction will be studied. In Fig. 8, scatter plots show the APE of all three property predictions for 5000 test samples with respect to V f and E cr . It can be noticed that, except few outliers, the absolute percentage error lies below 5%. The cumulative distribution function on the right side of Fig. 8 shows the fraction of samples below www.nature.com/scientificreports/ a particular APE. For example, 86% of samples have absolute prediction error less than 3% and below 5% APE for 97% of the test samples.
Extraterritorial performance. In the preceding sections, the surrogate model is built and trained to predict in a wide range of V f ∈ [25%, 75%] and E cr = E f /E m in [5,250] . Also, these models are tested on unseen samples which belong to the same range, and the performance is found to be within acceptable levels. It would be interesting to see how the model performs in the extrapolated domain which was not considered during the training. In Fig. 9, extrapolated domains of data sets ( D 2 , D 3 and D 4 ) with respect to the domain of main data set D 1 are shown schematically. In these extrapolated domains, the variation in the property is not significant from its connecting region of the native domain, as shown in the middle and right schematic of Fig. 9. So, the model is expected to predict with reasonably good accuracy as in the native domain. Importantly, such an exercise will help in evaluating the generality of the CNN model and its ability to predict properties of completely unseen microstructures whose characteristics are not present in the training dataset. For testing the model's perfor-  www.nature.com/scientificreports/ mance in these extraterrestrial domains, the size of data sets is selected in proportion to that of the domain size.
As the range of E cr is approximately the same for all the domains, the number of test samples is calculated based on the V f range. For data sets D 1 and D 2 , with 50% V f range, 5000 test samples are used, and for the remaining two data sets which have 15% V f range, 1500 test samples are used. The APE of the model predictions on these data sets is shown in Fig. 10, with respect to V f and E cr , along with the cumulative distribution function of APE.
In the case of D 3 and D 4 , as shown in Fig. 10b,c, APE shows an increasing trend with decreasing V f . This could be due to deviation in structural information of RVE with decreasing V f , though its target property is not changing significantly. In all three extrapolated domains, the APE of model predictions for at least 85-90% of the test samples is less than 5%. This suggests that the trained model can be used in the extraterritorial domain of V f and E cr .

Influence of physics-based bounds.
In the previous sections, we analysed model performance on the unseen samples of the trained data set domain and on the data sets of extrapolated domains. It is observed that the absolute percentage error of the predictions is within the acceptable limits. However, model predictions may or may not be physically admissible. In this section, the admissibility of these predictions is assessed using the physics-based bounds available in the literature 29 . We use simpler and relatively tighter Hashin-Shtrikman (HS) bounds 28 , which can be evaluated using the Eq. (10). In general, the lower and upper bounds on the effective properties of the composite material are separated by a large magnitude, as shown in Fig. 11a. It can be noticed that the bounds get wider with increasing V f and contrast ratio E cr . And, the transverse properties lie closer to the lower bound ( as shown in Fig. 11b,c), thus there is a possibility that the model prediction might go out of the lower bound.
where suffix f and m refer to fibre and matrix, K is the bulk modulus, G is shear modulus, E is Young's modulus, super-fix (−) and (+) indicate lower and upper bounds. The number of outliers to HS lower bounds is evaluated on all 10 realisations of the model, which are trained on 10,000 samples of the data set D 1 . The maximum number of outliers for each property with all four data sets is listed in Table 2.
It shows that a large number of model predictions on the data sets D 3 and D 4 are below the lower bound. Now we proceed to enforce these bounds during the model training such that all the model predictions fall within the bounds. While training a model, in general, bounds can be enforced in two ways. In the first approach, known as soft enforcement, the loss function of the model is regularised by weighed addition of the mean square errors of the deviation of the predictions from the bounds. Generally, the weights of these additional loss terms are hyper-parameters which need to be tuned manually. In the second approach, known as hard enforcement, the model predictions are transformed to lie within the bounds thereby avoiding additional hyper-parameters. In the present work, we chose to enforce bounds in a hard manner. In this approach, model architecture and training are similar to the one shown in Fig. 5, except few changes at the end of the network. The output of the network's last layer is mapped to [−1, 1] by applying the tanh activation function. Then, these values are further scaled to (9)    In each of (a-c) subplots, first two scatter plots show the APE of all three properties with respect to fibre volume fraction V f and elastic moduli contrast E cr . The cumulative distribution function of APE is shown on the right hand side. www.nature.com/scientificreports/ where y * ∈ [−1, 1] is the output of tanh activation function on the last layer, y (−) and y (+) are lower and upper bounds. It is observed that, unlike without bounds training, the training with bounds is sensitive to the learning rate; bounds-enforced models are trained with an optimal learning rate of 0.0005. The overall MAPE of the model predictions, after 200 epochs, is about 1.72 in the same range as with models trained without bounds (see Table 1). Nevertheless, the absolute percentage error of the predictions in the extrapolated domains D 3 and D 4 is improved, as shown in Fig. 12, in addition to eliminating the number of outliers, for all the domains. It suggests that, for predictions in the extrapolated domain, especially towards the lower fibre volume fractions, enforcing the bounds is important in predicting physically valid properties.

Conclusions
CNN models are developed for predicting the normalised transverse elastic properties of fibre-reinforced composites. In order to increase the applicability of the model, it is trained on a wide range of fibre volume fractions in [25%, 75%] and fibre-matrix elastic modulus contrast ratio in [5,250]. The model is shown to provide very good predictions even on completely unseen microstructures that lie outside of the considered range of volume fractions (in [10%, 25%]) and modulus ratios (in [250, 500]). Further, the study demonstrated that careful data set preparation and training design is crucial for achieving better model performance. In summary, • A simple and novel method is developed for encoding material properties of constituents in the greyscale image of the micro-structure so that the model learns material information along with the geometric information. • RVE binary image with a resolution of 256 × 256 is found to have minimum V f deviation ( < 1% ) from true V f ; Also, MAPE is found to have converged at this RVE image resolution. • Stochastic nature of the training process is quantified using the mean and standard deviation of MAPE, evaluated on 10 realisations of the model training. • Using the learning convergence curves, the optimum training set size is determined as ten thousand beyond which reduction in MAPE of model predictions is found to be negligible. • In the training set domain, at least 96% of the 5000 test sample predictions have absolute percentage error (APE) less than 5%. • In the case of extrapolated domains, at least about 85-90% of the test samples have APE less than 5%.
• At the end, we have trained the models with hard enforcement of the physics-based HS bounds such that the model predictions are always physically admissible. Also, this has improved the model's performance metric APE in the extrapolated domains D 3 and D 4 .  www.nature.com/scientificreports/ The proposed material encoding idea can be employed to build surrogate models for heterogeneous, anisotropic materials of varied constituent combinations by using the stack of relevant material information arrays In (a-d), first two scatter plots indicate the APE of model predictions with respect to fibre volume fraction V f and elastic moduli contrast E cr . On the right-hand side, the cumulative distribution function of APE shows the fraction of samples below 3% APE and 5% APE. www.nature.com/scientificreports/ as input to the network. Also, as the model spans a wide range of fibre volume fractions and elastic modulus contrasts, the trained models can be used in the inverse design of the microstructures which gives the properties of interest.

Data availibility
The datasets used and/or analysed during the current study are available at the following link https:// github. com/ 338ra jesh/ mpi-cnn.